
Concurrency & Parallelism in Backend Systems


Why Backend Systems Need Concurrency

  • Every backend system must handle multiple requests simultaneously

  • If a server handles only one request at a time:

    • Other users must wait
    • Leads to poor performance or failures
  • Concurrency helps:

    • Utilize system resources efficiently
    • Handle thousands of users concurrently

Typical Request Lifecycle

  • User → Server → Database → Response

  • Key observation:

    • Server spends significant time waiting for external systems (DB, APIs)

Network Latency Examples

  • Local DB: ~1–2 ms
  • Same region: ~20–30 ms
  • Different region: ~90–100 ms

The Core Problem: Idle CPU

  • While waiting for DB response:

    • CPU does nothing
  • Modern CPU capability:

    • ~3 billion instructions/sec (~3 million per ms)
  • Example:

    • 100 ms wait → 300 million instructions wasted
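
The arithmetic above can be checked directly. A quick sketch, assuming the hypothetical ~3 billion instructions/sec figure from these notes:

```python
# Back-of-envelope cost of an idle CPU during an IO wait.
# Assumes a hypothetical CPU retiring ~3 billion instructions per second.
INSTRUCTIONS_PER_SEC = 3_000_000_000

per_ms = INSTRUCTIONS_PER_SEC // 1_000   # instructions per millisecond
wasted = per_ms * 100                    # a 100 ms cross-region DB wait

print(per_ms)   # 3000000   (~3 million per ms)
print(wasted)   # 300000000 (~300 million instructions forgone)
```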

IO vs CPU Work

IO-Bound Work

  • Waiting for:

    • Database
    • External APIs
    • File system
  • Takes ~70–95% of time in backend systems

CPU-Bound Work

  • Actual computation:

    • Validation
    • JSON parsing
    • Encryption
    • Image processing

Key Insight

  • Typical API call:

    • ~250 ms IO waiting
    • ~10 ms CPU work
  • Result:

    • 95% resource underutilization without concurrency
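
A minimal sketch in Python of why this matters: four simulated requests, each with ~250 ms of IO (modeled with `time.sleep`), handled sequentially vs concurrently with threads. The numbers are illustrative, not a benchmark:

```python
import threading
import time

def handle_request():
    time.sleep(0.25)  # simulated ~250 ms IO wait (DB/API); CPU work omitted

# Sequential: each request waits for the previous one (~1 s total)
start = time.perf_counter()
for _ in range(4):
    handle_request()
sequential = time.perf_counter() - start

# Concurrent: all four IO waits overlap (~0.25 s total)
start = time.perf_counter()
threads = [threading.Thread(target=handle_request) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```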

What is Concurrency?

  • Ability to handle multiple tasks at once (logically)

  • CPU switches between tasks:

    • Start → Pause → Resume

Key Idea

  • While one task waits (IO):

    • CPU works on another task

What is Parallelism?

  • Ability to execute multiple tasks simultaneously (physically)

Requirement

  • Multiple CPU cores

Concurrency vs Parallelism

Concurrency

  • Single CPU core
  • Tasks interleave execution
  • Improves resource utilization

Parallelism

  • Multiple CPU cores
  • Tasks run at same time
  • Improves execution speed

Simple Analogy

  • Concurrency:

    • One chef cooking multiple dishes (switching tasks)
  • Parallelism:

    • Multiple chefs cooking simultaneously

Timeline Understanding (Conceptual)

  • Request A starts → uses CPU → waits (DB)

  • CPU switches to Request B

  • When A’s response returns:

    • CPU resumes A later

Key Point

  • At any moment:

    • Only one task runs (single core)
    • But multiple tasks are in progress

Why This Matters

  • Backend systems are mostly IO-bound

  • Without concurrency:

    • CPU stays idle most of the time
  • With concurrency:

    • CPU is always utilized

When to Use What

Use Concurrency (Most Cases)

  • IO-heavy workloads:

    • DB queries
    • API calls
    • File operations

Use Parallelism

  • CPU-heavy workloads:

    • Image processing
    • Encryption
    • Video encoding
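
For the IO-heavy case, a thread pool is the usual tool. A sketch using Python's `concurrent.futures` (the `fetch` task and its 100 ms delay are made up for illustration); for CPU-heavy work you would reach for processes or threads spread across multiple cores instead:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(i):
    time.sleep(0.1)   # stand-in for a DB query or API call
    return i * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, range(10)))
elapsed = time.perf_counter() - start

# Ten 100 ms waits overlap, so this finishes in ~0.1 s, not ~1 s.
print(results, f"{elapsed:.2f}s")
```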

Real-World Backend Behavior

  • Server handles:

    • HTTP requests
    • Logging
    • Background jobs
    • Telemetry
  • All compete for CPU time

  • Concurrency ensures:

    • Efficient scheduling across all tasks

How Concurrency is Implemented

Two main mechanisms: threads and the event loop (async IO). These notes cover threads.

1. Threads

  • OS-level execution units

  • Each thread:

    • Has its own stack
    • Has its own instruction pointer
  • Managed by the OS scheduler
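
In Python these OS-level units are exposed through the `threading` module; a minimal sketch:

```python
import threading

def worker(results):
    # Locals here live on this thread's own stack.
    me = threading.current_thread()
    results.append((me.name, me.ident is not None))

results = []
t = threading.Thread(target=worker, args=(results,), name="worker-1")
t.start()   # from here on, the OS scheduler decides when the thread runs
t.join()    # wait for it to finish
print(results)  # [('worker-1', True)]
```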


Thread Scheduling

  • OS assigns time slices (e.g., 2 ms)

  • After time slice:

    • Thread pauses
    • Another thread runs

Preemptive Scheduling

  • The OS preempts (stops) threads automatically when their time slice expires
  • Ensures fairness across tasks

Blocking Behavior

  • When thread hits IO:

    • Marked as blocked
  • OS switches to another thread

  • Once IO completes:

    • Thread becomes runnable again
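
A sketch of blocking in action: one thread sits in a simulated IO wait (`time.sleep`) while the main thread keeps the CPU busy, then picks up the result once the wait completes:

```python
import threading
import time

result = {}

def slow_io():
    time.sleep(0.2)        # thread is blocked; the OS frees the CPU
    result["io"] = "done"  # runnable again once the wait ends

t = threading.Thread(target=slow_io)
t.start()

# Meanwhile the main thread does useful (here: dummy) CPU work.
count = 0
deadline = time.perf_counter() + 0.1
while time.perf_counter() < deadline:
    count += 1

t.join()
print(count > 0, result)  # True {'io': 'done'}
```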

Memory Model of Threads

Within Same Process

  • Threads share:

    • Heap memory
    • Global variables

Between Processes

  • No shared memory (isolated)

Communication Between Threads

  • Done via shared memory

  • Advantages:

    • Fast (no serialization)
  • Risks:

    • Race conditions
    • Data corruption
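
The classic race is an unguarded counter. The sketch below guards it with `threading.Lock`; removing the `with lock:` line makes the final value nondeterministic in principle (interpreter details can mask the race):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # the read-modify-write must be atomic
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 — deterministic only because of the lock
```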

Parallelism with Threads

  • If multiple CPU cores:

    • Multiple threads run truly in parallel
  • Improves:

    • CPU-bound performance

Cost of Threads

1. Memory Overhead

  • Each thread:

    • Stack ~KBs to MBs
  • Example:

    • 10,000 threads → several GB memory
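
The arithmetic, assuming a hypothetical 1 MB stack reservation per thread (the real default varies by OS and runtime, from tens of KB up to several MB):

```python
stack_bytes = 1 * 1024 * 1024      # assumed 1 MB stack per thread
thread_count = 10_000
total_gb = stack_bytes * thread_count / 1024**3
print(f"~{total_gb:.1f} GB")       # ~9.8 GB just for stacks
```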

2. Creation Overhead

  • Creating thread involves:

    • System call
    • Stack allocation
    • Scheduler registration
  • Takes:

    • Microseconds to milliseconds
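
This is easy to measure roughly; a sketch timing full create/start/join cycles in Python (absolute numbers vary widely by machine and OS):

```python
import threading
import time

def noop():
    pass

N = 200
start = time.perf_counter()
for _ in range(N):
    t = threading.Thread(target=noop)
    t.start()   # system call + stack allocation + scheduler registration
    t.join()
avg_us = (time.perf_counter() - start) / N * 1e6
print(f"~{avg_us:.0f} µs per thread lifecycle")
```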

Key Takeaways

1. Backend Bottleneck

  • Mostly IO-bound, not CPU-bound

2. Concurrency is Essential

  • Prevents CPU idle time
  • Enables handling many users

3. Parallelism is Situational

  • Useful for heavy computation tasks

4. Threads are Powerful but Expensive

  • High memory + creation cost
  • Need careful management

Mental Model to Remember

  • CPU is valuable → never keep it idle

  • While waiting → do other work

  • Structure program to:

    • Pause IO tasks
    • Resume later
    • Keep CPU busy
